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1 Introduction 



In numerical integration, the magnitude of the integration error depends on two 
things: 

1. The fluctuating behaviour (variance, variation) of the integrand. This be- 
haviour may be improved by so-called variance-reduction techniques, which 
we shall not discuss in this paper (see, for example, ||). 

2. The quality of the point-set employed for the integration. We shall see that 
the measure of quality will be the uniformity of the point-set. This paper 
deals with the uniformity properties of point sets for numerical integration: 
in particular, we discuss various definitions of uniformity, their relation to 
typical integrands encountered, some of their theoretical properties, and 
the actual performance of algorithms (quasi-random number generators) 
claimed to yield very uniform point sets. 

The outline of this paper is as follows: in section 2, we discuss the traditional 
Monte Carlo approach to numerical integration and relate the relatively slow 
convergence of the error, as 1/yN (where N denotes the size of the point set), 
to the inherent non-uniformity of the truly random points, used in classical 
Monte Carlo. This is made explicit by the notion of classical discrepancy of the 
point set; we quote several theorems relating this discrepancy to the integration 
error. From these it is seen, that low-discrepancy point sets lead to small errors. 
In section 3, we describe a number of algorithms that have been proposed for 
constructing quasi-random sequences, that lead to point sets with a discrepancy 
smaller than that expected for truly random point sets. In general, for these 
algorithms, there exist discrepancy bounds, that are valid only in the limit N — > 
oo. (Their observed behaviour for finite N is reported in section 6.) In section 4, 
we point out that the definition of classical discrepancy is intimately connected 
with the underlying, implicit, model for the type of integrand attacked; this 
function class is governed by the Wiener measure, which may or may not be 
appropriate, depending on the context of the integration problem. We describe a 
number of alternative choices of function class, each leading to its own definition 
of discrepancy. In section 5, we argue that the merit of a 'low-discrepancy' 
sequence can only be judged in relation to a truly random point set. To this 
end, one has to know the probability distribution of a given discrepancy, viewed 
as a stochastic variable through its dependence of random points. We derive 
several results in this direction. Finally, in section 6, we compute the classical 
discrepancy for a number of quasi-random sequences for moderately large point 
sets (N up to 150000), and moderately large dimension (up to 20). 

2 Integration and discrepancy 
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2.1 Numerical integration 

The problem we shall be considering is the estimation of the multidimensional 
integral 

J = J dn(x) f{x) , (1) 

K 

where x = = (x , x 2 , . . . , x s ) denotes a point in the s-dimcnsional integration 
region K. Throughout this paper, Greek indices like fj, will denote individual 
coordinates. The integrand is f(x), and d/z(x) denotes a measure on K, that is, 
we must have 

J d»(x) = l , (2) 

K 



and dfi(x) must be positive. We shall only consider measures that allow a prob- 
abilistic interpretation, that is, there exists, in principle, a computer algorithm 
for turning a sequence of truly random numbers into a series of points x such 
that their probability density is given by d^i(x): 

Prob(x e A) = j dfj,{x) , (3) 

A 

for all small rectangular regions A. 

We estimate J using the unweighted sum: 

k 

where the Xk constitute a point set, that is, a finite set of TV points Xk, k = 
1,2, ... ,N obtained in one or another way. Note that Latin indices like k run 
over the points in the point set and the sum is always understood to run from 
1 to N. The error made in using this estimate is clearly 

n = S-J . (5) 

The object of this paper is to find point sets Xk which can be used to estimate 
the integrals of general classes of functions / with small error n. 

We restrict the choice of integration regions K to the unit hypercube I s = 
[0, l) s since this is the most common and best understood region, but we should 
point out that the properties of many integration methods may depend strongly 
on the shape of the integration region. 

Finally, throughout this paper we shall adopt the real-number model of 
computation, that is, we shall disregard all kinds of small corrections related 
to the finite word size of our computers, or the fact that the numbers we use 
cannot be truly irrational: we assume that our computers have enough precision 
to approximate real numbers for all practical purposes. 
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2.2 Monte Carlo integration: the Central Limit Theorem 



In the Monte Carlo method the point set is taken to be a sample of random 
points, independent and identically distributed (iid) with probability density 
dfi(x). The error n is then also a random quantity, with some probability density 
P{rj). This is most easily studied in the form of a moment- generating function 
G: with M — \/~N, we can write 



G{Mz) = ( e Mzr > ) 



-MzJ 



n 



,zf(x k )/M 



(6) 



where the brackets denote an average over all point sets consisting of N points 
chosen with the measure d/i(x) as a probability density. Since the points Xk are 
random, they are independent, and we can easily take the average: 



,zf(x k )/M 



n>0 ' K 



For large enough N, the following approximation is justified: 

N 



G{Mz) = 



-z.J/M 



E 

ra>0 



n\M n 



1 + ^(W 2 ) + ^(J 3 -3J 2 J + 2J3) 



N 



= exp^-(J 2 -^)j(l + 0(l/M)) . 
Up to terms of order 1/M, we may compute P(rj) by Fourier transform: 



(7) 



(8) 



P(V) = ^ / ' 



V = J 2 -J 2 



zr >G(iz) 



^2irV/N 



exp 



-Nif 



(9) 



This is the Central Limit Theorem: the error is normally distributed around 
zero, with standard deviation \JV/N ; V is called the variance of the integrand: 
it is positive for every f(x) that is not constant over K. We note the following 
aspects of the above derivation. 

• The Monte Carlo estimate is unbiased: the expectation value of the error 
is zero (this holds also for finite N). 

• The error distribution attains a simple form owing to the 'large- N approx- 
imation', that is, we neglect terms of order 1/N. 



4 



• The result for the expected square error is rigorous: but its use as an 
estimate for the error made using this particular point set implies that we 
postulate this point set to be a 'typical' member of the ensemble of point 
sets governed by the underlying measure dfi(x). 

• The error estimate does not depend on the particularities of the point set 
actually used, since we have integrated over all such point sets. 

• The error estimate does depend on the particularities of the integrand, 
namely its variance, which is a measure of the amount by which it fluctu- 
ates over K. 



• Any quadratically intcgrable function can be integrated by Monte Carlo 
but the convergence of the error to zero is (only) as l/VN, and additional 
smoothness properties, such as continuity or differentiability only lead to 
smaller error inasmuch as they lead to a smaller V. 



2.3 Classical discrepancy 

In the previous section we have established the convergence of classi- 

cal Monte Carlo, but it is known that, in one dimension, even the unweighted 
quadrature method known as the trapezoid rule, where the points are equidis- 
tantly distributed, gives a l/N 2 convergence. Clearly, the irregularities in the 
random, rather than equidistant, point set play a role in the behaviour of the 
error. We now introduce the classical discrepancy, which will allow us to quan- 
tify such irregularities, not only in one dimension, but also in higher dimensions 
where the trapezoid rule is outperformed by Monte Carlo. 

Let K be the s-dimensional hypercube I s , and y be a point in K. We also 
assume that dfi(x) is the usual Cartesian measure on K: random points under 
this d/j(x) have a uniform distribution over K. We define the following counting 
function: 

x(y;z) = n% M -^) . ( 10 ) 

which simply checks if the point x is 'below y\ i.e. inside the hyper-rectanglc 
defined by the origin and y. The local discrepancy at y, for the point set Xk, is 
then 

ff(y) = ^£x(»;a*)-nf" • ( n ) 

k A 1 

The function g(y) counts the fraction of the point set is below y, and compares 
this with the volume of K that is below y. From a slightly different point of 
view, g(y) is nothing but the integration error made in estimating the volume 
below y using the given point set. Clearly, the more uniformly distributed the 
point set, the smaller g(y). This suggests the notion global discrepancy, that 
must inform us about the global deviation of g{y) from zero. There exist various 



5 



such quantities. Let us define 

D m = J dyg(y) m . (12) 
K 

Then, useful measures of global discrepancy are D\ (linear discrepancy), D 2 
(quadratic discrepancy), and the extreme, or Kolmogorov discrepancy: 

#oc = lim (D 2k ) 1/2k = sup \g(y)\ . (13) 

This last discrepancy (called D* in ||), has been the most widely studied, but 
D 2 appears to be easier to evaluate for a given point set (as we shall see), and 
may be more relevant as well. 

We shall derive some results on the discrepancies for random points later 
on, but at this point we may already quote the various expectations for truly 
random point sets: 

(A) = 0, <^> = ^Q^) , i^) S -^ S 2 . (14) 
2.4 Results for integration errors 

It should be intuitively clear that the size of the discrepancy of a point set 
must relate to how good that point set is for numerical integration. This has 
indeed been studied extensively; see, for example [|| from which we give some 
important results. 

In the first place, if a point set has extreme discrepancy equal to D, every 
rectangular region (with edges parallel to the axes) of volume 2 S D or larger 
must contain at least one pointj^: the largest rectangular 'gap' (with no points) 
has size 2 s D OQ . The discrepancy is seen to measure the 'resolution' of the point 
set. Secondly, let us define the 'extremum metric' d on K by 

d(x,y)= sup \x*-y»\ , (15) 

for any two points x and y in K. The modulus of continuity M of the integrand 
is then the following function: 

M(z) = sup \f(x) - f(y)\ , tor d(x,y)<z . (16) 

x,y 

That is, the modulus of continuity tells us what is the maximum jump in the 
value of /, if we make a step of maximal size z. Then, when using the point set 
to estimate J by the quantity S, the error 77 obeys 

M<4Af(Z^ s ) ; (17) 

1 The factor 2 s is due to the restriction, in the definition of discrepancy, to hyper-rectangles 
touching the origin. 
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the appealing result that the error depends on the maximal jump in value, for 
steps of half the size of the edge of the hyper-cubic 'gap' that corresponds to 
the resolution of the point set. 

Although this result is only valid for continuous functions, for which M(z) — > 
as z — > 0, we may always approximate a discontinuous function by a continuous 
one: but then, M(z) will no longer vanish with z, and we have a finite error 
bound even for zero discrepancy. 

A large number of other, similar results are given in 0, for instance the 
Koksma-Hlawka bound which relates the error to the variation of the integrand, 
rather than to its variance: but, since for discontinuous functions the variation 
is in many cases infinite even when the variance is finite, such a bound may 
be of limited value; moreover, quantities like the modulus of continuity or the 
variation are in practice very hard to compute exactly Nonetheless, these results 
teach us that the lower the discrepancy, the smaller we can expect the integration 
error to be. 

2.5 Quasi-random Sampling for Simulation 

Formally at least, most simulations are in fact equivalent to numerical integra- 
tions, since one is usually looking for the expectations of particular variables or of 
distributions of variables and these expectations are simply integrals over some 
subspaces (histogram bins) of the full sampling space. However, in practice, 
additional considerations may be important. In simulations of particle physics 
experiments for example, the dimensionality as defined by random numbers per 
event tends to be extremely high and often is not even the same from event to 
event. Clearly the results of this paper are not then immediately applicable, but 
one somehow suspects that there must nonetheless be a way to take advantage 
of the improved uniformity of quasi-random points in such simulations. 

A promising way to approach this problem is to decompose the full simula- 
tion of an event into manageable parts, some of which may then be of reasonably 
low fixed dimension. For example, the physical event must be generated, and 
independently some of the particles may interact or decay, and independently 
they may produce detector hits, showers, etc. It may be advantageous to use 
quasi-random points in the simulation of one or more of these sub-processes. 

Another difficulty in simulation is that the detector boundaries or decision 
boundaries tend to be discontinuous, which formally makes the problem equiva- 
lent to the numerical integration of a multidimensional discontinuous function, 
and the variation of such a function is generally unbounded. Here again, prac- 
tical experience indicates that the situation is not as grim as all that, and that 
such integrals do in fact converge with a considerably smaller error using quasi- 
random points U . 

3 Low-discrepancy point sets 
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3.1 Finite sets and infinite sequences 

Much effort has been devoted to construct point sets that have lower discrepancy 
than those consisting of random points. We must distinguish between two cases. 
On the one hand, one may consider finite point sets, where N is fixed. In 
principle, for each value of N in the s-dimensional hypercube, there exists a 
point set with the lowest discrepancy of all point sets. That discrepancy must 
be larger than the so-called Roth bound || : 

n„>C ^'- 1)/2 , (is) 

where C s depends only on s. For large N and reasonable s, this lower bound 
is much lower than the expected discrepancy for random point sets but, except 
in one dimension, it is not known how to construct such an 'optimal' set, and 
it is not even known if it is possible to attain this bound in general. The one- 
dimensional exception is the equidistant point set: 

2k — 1 

X * = -ZN~ ' k = 1 > 2 T--> N » ( 19 ) 

or any rearrangement of this set. In higher dimension, Cartesian products of 
this sequence, the so-called hyper-cubic lattice, are not optimal. For, let us 
enumerate a hyper-cubic sequence by 

— 1 



For this point set, we indeed have D\ = 0, but 



£ 2 = I 

3 s 



I ( I H- -2(1 



2M 2 J V 8M 



(21) 



which only goes as N~ 2 / s for large N. This discrepancy is larger than that 
expected for random points if s > 2: and this is the underlying reason why, in 
higher dimension, trapezoid rules generally perform less well than Monte Carlo 
for the same number of points. 

It is in principle possible to use numerical minimization techniques to find 
approximately optimal point sets, but the computational complexity of this 
problem is so enormous that it can be attempted only for very small values of 
both N and s. 

On the other hand, we usually do not wish to restrict ourselves in advance 
to a particular value of N, but rather prefer to find a formula which yields 
a long sequence of points for which the discrepancy is low for all N, perhaps 
even approaching an optimal value asymptotically. This allows us to increase 
the size of our point set at will until the desired convergence is attained. We 
shall, therefore, mainly concentrate on sequences of indefinite length. Their 
discrepancy for any given N will usually be larger than that of an optimal point 
set with the same N; but, 'on the average', we may hope to remain close to the 
Roth bound. 
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3.2 Richtmyer sequences 

Probably the first quasi-random point generator was that attributed to Richt- 
myer and defined by the formula for the uth coordinate of the fcth point: 

xl = [fcSJmod 1 , (22) 

where the 5 M are constants which should be irrational numbers in order for the 
period to be infinite. Since neither irrational numbers nor infinite periods exist 
on real computers, the original suggestion was to take 5 M equal to the square 
root of the /ith prime number, and this is the usual choice for these constants. 

Figure |l| shows how the distribution of Richtmyer points in two dimensions 
evolves as N increases. A very regular band structure develops. The bands then 
become wider until they completely fill the empty space. Then they start to 
overlap and begin to form other bands in other directions which also widen, and 
the process continues with each successive band structure becoming narrower 
and more complex. This kind of behaviour is a general property of quasi-random 
sequences, even those described below which are based on a very different algo- 
rithm. 

This simple quasi-random generator does not seem to have received much 
attention, even though it is the only one we know based on a linear congruential 
algorithm (others are based on radical-inverse transformations), and as we shall 
see below, its discrepancy is surprisingly good. From a practical point of view, 
some potential users have probably been discouraged by plots of points in two- 
dimensional projections which show some long-lasting band structure in some 
dimensions, due of course to an unfortunate combination of constants <S^. In 
this paper we use always the square roots of the first prime numbers, but we 
should point out that it may be possible to find other values of which assure 
a long period and minimize the band structure or even minimize directly the 
discrepancy. This is of course a very compute-intensive calculation. 

3.3 Van der Corput (or Halton) sequences 

An interesting low-discrepancy sequence can be found as follows Let us 
start, in one dimension, by choosing a base, an integer b. Any integer n can 
then be written in base b: 

n = n Q + nib + n 2 b 2 + n 3 b 3 H . (23) 

The radical-inverse transform (to base b) is defined by 

(t> b {n) = nob- 1 + ni b- 2 + n 2 b- 3 + ri 3 b- 4 -\ . (24) 

The van der Corput sequence to base b is then simply the following: 

x k =4> b {k) , fe = l,2,... . (25) 

Note that, as k increases, the digit no changes most rapidly, the digit n\ the 
next rapidly, and so on: so that the leading digit in Xk changes the most rapidly, 
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0.5 

(a) 10000 points 
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0.5 1 
(a) 2000 points 



0.5 1 

(a) 40000 points 



Figure 1: The distribution of Richtmyer points in two dimensions, showing the 
evolution as N increases. 
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the next-to-leading one less rapidly, and so on. It is therefore clear that this 
sequence manages to fill the unit interval (0, 1) quite efficiently. Another insight 
may be obtained from the fact that, when N = b m , for integer m, the set 
X\, x%, ■ ■ ■ , xn will precisely be equidistant, of the type of Eq. ( Jl9| ) , and for that iV 
the discrepancy will be optimal. The discrepancy is therefore optimal whenever 
TV is a power of b. For other values of N, it cannot stray away too much from 
the optimal value, but as m increases it can go further and further out before 
returning: it is therefore no surprise that the discrepancy has a low upper bound: 



log N 4(b+l)logb 

£oc < C b -2— , C b = { (26) 



N 



b-l 
4 log b 



when b is even 
when b is odd 



A generalization of Eq.(|25|) to higher dimensions is obvious: one simply 
chooses several bases b%, 62, . . . , b s , and 

a£ = , fc=l,2,... , (m = 1,2,---,s . (27) 

In figure ||, we show the distribution, in two dimensions, of the first 1000 points 
obtained with b\ — 2, 62 = 3. This distribution is, even visually, smoother and 
more uniform than a typical set of 1000 random points (see figure ||). Clearly 
we will get into trouble if any two bases have common factors, and indeed the 
most common choice (and our choice here) is to take as bases the first s prime 
numbers. However, for dimensions 7 and 8 for instance, we then need bases 
17 and 19, for which the plot is given in the bottom half of figure |[ a lot of 
structure is still visible, and the square is not filled as uniformly. 

This can also be understood easily: the 'recurrence' of the discrepancy to a 
low value, discussed for s — 1, could be encountered here only if N is an exact 
power of each of the bases, and since these are all relatively prime, this cannot 
happen. Numbers that are 'close' to exact powers of all the bases (in the sense 
mentioned above) are guaranteed to be far apart, and hence the discrepancy 
can become considerable. This is in fact the content of the multidimensional 
equivalent of Eq. (|26|) : 



N N 1 ^\2\ogb fl 2 log N 

As a function of s, therefore, the constant factor grows faster than an exponen- 
tial of s, and in high dimensions the superiority of the van der Corput sequence 
over a purely random one will appear only for impractically large N. 



3.4 SoboP and Niederreiter sequences 

Improved discrepancy in higher dimensions can be obtained from the van der 
Corput sequences by making use of additional functions m M (fc), fj, = 1, 2, . . . , s, 
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Figure 2: The first van der Corput points with bases 2 and 3 (top) and bases 
17 and 19 (bottom). 
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and denning a new sequence 

x' 1 k = ct> b {m li {k)) , (29) 

The functions m M (fc) are cleverly chosen, amongst other things in such a way 
that when N = b m , the numbers m M (l), m M (2), . . . , m fl (N) are just a permuta- 
tion of (1, 2, ... , N). In this way, one can attain a constant factor (C s in Eq.(|l8|)) 
that in principle decreases super-exponentially with s. Practical experience with 
these sequences is still limited, however. 

We have investigated empirically the behaviour of two such sequences: the 
SoboP sequence, described in jl^], and the Niederreiter base-two sequence, de- 
scribed in M . The general Niederreiter sequences form a large class of which 
both the above are special cases. 

4 Problem classes and discrepancies 

4.1 A conceptual problem for special point sets 

We have seen how, in classical Monte Carlo, the error estimate is based on the 
assumption that the point set is one out of an ensemble of such, essentially 
equivalent, point sets: we then average over this ensemble. We have also seen 
that the lower a point set's discrepancy, the smaller the error could be. But 
when we try to apply this statement to the integral of a particular function, 
we immediately run into trouble. Namely, low-discrepancy point sets are not 
'typical', they are special. In particular, the points are correlated in a possibly 
very complicated way, so that the simple treatment of Eq. (fy cannot be valid. 
One way out would be to define, in some manner, an ensemble of point sets 
with a given discrepancy. This approach is followed in 0]. 

Another option, which we shall pursue in the following, is to respect the 
specialness of the point set, and instead consider the integrand f{x) to be a 
'typical' member in some set: this we shall call the problem class. An example 
of such a class could be that of all polynomials of some degree, but we shall 
encounter other, more relevant, cases. Then, we may hope, instead of averaging 
over the ensemble of point sets to end up with an error estimate depending on the 
integrand, to integrate (or average) over the problem class: in that case, we will 
end up with an error estimate that is independent of the particular integrand, 
but related to some property of the point set, for instance its discrepancy as 
discussed above. It should be clear that different problem classes will lead to 
different measures of irregularity of the point set: we shall call such a measure 
the induced discrepancy corresponding to the problem class. 

To pursue this line of reasoning, we must therefore first consider what con- 
stitutes a problem class, and this we shall do next. 

4.2 Green's functions and connected Green's functions 

It is our aim to determine the integration error by averaging over all integrands 
in a given problem class. Strictly speaking, we then have to define a measure 
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T>f on this problem class, which defines the probability density for members of 
the class. In fact, we can settle for less: the kind of information we need about 
the problem class is the set of all Green's functions in the problem class: 

, X 2 , ■ ■ • , X n ) — (f(xi)f(x 2 )---f(x n ) } f 

Vf f(x 1 )f(x 2 )---f(x n ) , (30) 



/ 



where this time the brackets denote an average over the problem class: it is 
the 'typical' value expected for the product of integrand values at the points 
xi, X2, ■ ■ ■ , x n . It is, of course, symmetric in all its arguments. Knowledge 
of the Green's functions will be sufficient for our purposes, even if we cannot 
(as in the case of the Wiener measure) write down the form of a particular 
integrand in the problem class in any simple manner. The kind of problem class 
we consider, and therefore the Green's functions, must of course be determined 
by the nature of our integration problem. The restriction to continuous (or 
even smoother) functions, popular in the mathematical literature, is not so 
appropriate in particle physics phenomenology, where experimental cuts usually 
imply discontinuous steps in complicated shapes. 

Even more relevant for our purposes than the Green's functions themselves 
arc the connected Green's functions c n (xi, x 2 , ■ ■ ■ , x n ), which form the irre- 
ducible building blocks of the Green's functions. We have 

9oQ = 1 , 
gi(xi) = ci(xi) , 
g2(xi,x 2 ) = ci(xi)ci(x 2 ) + c 2 (xi,x 2 ) , 
33(^1,^2,2:3) = ci(xi)ci (x 2 )ci (x 3 ) + ci(xi)c 2 (x 2 ,x 3 ) + ci(x 2 )c 2 (xs,xi) 

+ ci(xs)c 2 (x 3 ,xi) + cs{xi,x 2 ,x 3 ) , (31) 
and so on; the following recursive definition holds: 
g n (xi,x 2 , ...,x n ) = 

(£1 

^ (m- 1)! <n-m)\ ' [ ' 

V m=l v 1 v ' 

where (z 2 ,z 3 , . . . , z n ) is a permutation V of (2:2,2:3, • ■ • , x n)'- the first sum runs 
over all (n — 1)! such permutations. The connected Green's functions arc, of 
course, also totally symmetric. The nice thing about connected Green's func- 
tions is that in many cases there is only a finite number of non-vanishing ones; 
for instance, for a Gaussian problem class such as the Wiener sheet measure, 
only c 2 is nonzero: in that case g n is zero if n is odd, and g 2n consists of 
(2n)!/2"n! terms, each a product of n factors c 2 . 

4.3 Induced discrepancy 

We now proceed to derive how the set of Green's functions induces a quantity 
with the general properties of a discrepancy. If the integrand is chosen 'at 
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random' from the problem class, the integration error rj will again be random 
even for a fixed point set, and we can derive its expectation value and moments. 
Denoting averages over the problem class by brackets we have 



my v 



m.p i 



p=0 



k\ tk,2 ,. . . , kp 

■ J d\i{z m ) g m {x kl , Xk 2 , ■ ■ ■ , x kp , z p+ i, z p+2 , ■ ■ ■ , z m ) .(33) 

K 

Now, we express the g m in the connected Green's functions. The combinatorial 
factors are quite involved, but the result can be summarized as follows. The 
moment ( rj m ) consists of a sum of all powers of all possible combinations of 
c's, with each c having as arguments all possible combinations of summed-over 
point set members and integrated-over points in K. Let us define 



ki ,&2 j ■ "jfefc j,^ 



d/j,(z k+2 ) ■ 



■■■ J dfj,(z n ) c n (x kl ,x k2 ,...,x kk ,z k+1 ,z k+2 ,...,z n ) . (34) 

K 

The various a n ^ k with the same n will always occur in the combination 

-1)"~V; 
k\{n-k)\ 



fe=0 

and the m* n moment of r\ is given by 

Jf\ JV1 JV'1 

( V m ) f = m\ y^W^... , (36) 



where the sets of integers {v n } are restricted by v n > 0, and ^i+2^ 2 + 3^3 + - • • = 
m. Note that, whenever c n is a constant, d n is zero. 

We may now introduce the moment-generating function for r\: 



< eZ ">/ = E^ m >/ = ex p £* m M • ( 37 ) 

m>0 ' \m>\ 

In many cases, we may be able to construct the actual probability distribution 
P(rj) for r\ by inverse Mellin transform. We are assured, in fact, that this 
transform always exists - after all, if the integrands have some probability 
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distribution over the problem class, then the integration error also has some 
distribution. 

More important is the realization that, if the problem class is Gaussian in 
the sense that the only nonzero connected Green's function is c 2 , then P{rf) is 
necessarily a normal distribution, and we only have to know ( i] 2 j to know all 
confidence levels. In addition, note that whenever d\ is nonzero, the integration 
will be biased. If, in addition to d 2 , also d± is nonzero, it had better be negative 
- but, if it is positive, the problem class itself is not well-defined. 

What, now, is the use of all this for practical error estimates? This lies 
in the following factorization property: suppose that we can find a function 
h(x;y), with x in K, and y being some parameter in some space L, such that 
the connected Green's function can be expressed in terms of h as follows: 

Cn(xi,x 2 , ■ ■ ■ ,x„) = J dy h(x 1 ;y)h(x 2 ;y)---h(x n ;y) : (38) 

L 

then, we can easily derive that d n has the simple form 

d„ 

H(y) 

The function H{y) is the induced local discrepancy, and d n is a measure of 
induced global discrepancy. H(y) has precisely the form of Eq.(|TT|): it measures 
how well the function h{x; y) is integrated by the point set, for a given value of 
y: but, note that h(x] y) itself is not necessarily a member of the problem class. 
As we shall see, many problem classes have the factorization property. For such 
a problem class, we then have the following strategy available: 

1. determine the Green's functions; 

2. determine the connected Green's function c 2 ] 

3. find a parameter y and a function h(x,y) such that factorization holds; 

4. for a given point set, determine H (y) and d 2 ; 

5. we have then estimated the expected moment ( rj 2 ) for the integration 
error if we use this point set on an integrand picked at random from the 
problem set. 

4.4 Orthonormal-function measures 

We shall now discuss how to apply the above formalism for a special kind of 
problem class. As usual, we start with the one-dimensional case. Let us take 



- / dyH(y) n , 

L 

^^2h(x k ;y) - J dn(x)h(x;y) . (39) 



K 



16 



K = [0, 1], and d^(x) = dx, the usual uniform measure. Furthermore, we define 
a set of functions u n {x), n = 0, 1, 2, 3, . . ., that are orthonormal: 

l 

dx u m (x)u n (x) = S m ,n , Uo(x) = 1. (40) 
o 

As an example, we may take the u m to be orthonormal polynomials, and we 
shall presently discuss some other examples. We then define our problem class 
to consist of all functions that can be written as a linear combination of these 
orthonormal ones: 

f( x ) = E v ^ u ^ x ) > ( 4l ) 

n>0 

so that the coefficients v n completely determine f(x). If our orthonormal set is 
even complete, that is, if 



N 

lim y^u n (x 1 )u n (x 2 ) = S(xi ~ x 2 ) , (42) 

n=l 

we can actually approximate any f(x) to an arbitrary accuracy by members of 
the problem class. 

A measure on the problem class can now be specified by giving the prob- 
ability density of each individual coefficient v n . For simplicity, we take these 
distributions to be normal, and we can write 

Vf=\{dv n -L=e^(-^) , (43) 
„>o V^f V 2 <) 

where a n is the width, which may be different for different v n . Now, the only 
nonzero connected Green's function is c 2 , and we always have the factorization 
property: 

l 

c 2 (x 1 ,x 2 ) = ^2 ^u n (xi)u n (x 2 ) = dy h(x 1 ;y)h(x 2 ;y) , 
h(x;y) = o- n Un(x)u n (y) ■ (44) 
Thus, we can immediately write down the induced discrepancy: 

^ ortho = ^E CT «E u «(^)"™^) . ( 45 ) 

n k,l 

where the caret denotes a sum over all n except n — 0: this is due to the fact that 
Uq(x) = 1 is a constant, and obviously the constant component of an integrand 
is always estimated with zero error by any point set.[] 

2 The quadratic discrepancy D®*^ is defined here as 2d,2, such that it is the expectation 
value of ^ r] 2 \ . 
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The extension to more dimensions is fairly straightforward, if we replace the 
simple one-dimensional enumeration n — 0, 1, 2, ... by a multidimensional one: 



n = n= (ni,n 2 , . . . ,n a ) 



=0,1,2,3,... 



(46) 




(47) 



and (for a Gaussian measure as above) we again have 



>ortho 



1 



y^Vfj ^2 Uf{(x k )Ufi(xi) 



(48) 



where now the sum runs over all vectors n = (m, . . . , n s ) except the null vector 



In the following we shall discuss some explicit examples of these problem 
classes, and the particular form of their induced discrepancies. At this point, 
however, we may already note that, if the orthonormal set is complete, and if 
<7„ does not go to zero as n — ► oo, then D® 1 ^ 10 will be infinite, since it will have 
contributions proportional to 5(0). In concrete terms, in such a problem class 
the typical integrands show such wild behaviour that the average squared inte- 
gration error is infinite. In our explicit examples we shall see how the behaviour 
of er„ relates to the smoothness properties (or lack thereof) of the functions in 
the problem class. 

4.5 Examples 

We shall now discuss some explicit examples of problem classes, and their in- 
duced quadratic discrepancies. The first two examples are not of the orthonormal- 
function type; the last two are. 

4.5.1 Wiener sheet measure: the Wozniakowski Lemma 

One of the more popular problem class measures is the Wiener measure. It 
is widely used in field theory and statistical physics, and describes, basically, 
a Brownian motion. Although its precise definition, especially in more dimen- 
sions, is subtle, we can describe it in the following manner. Let us start with 
the case s = 1, and 'discretize' the unit interval [0,1) into many small subin- 
tervals (not necessarily equal), to wit [0,a;i), [xi,x 2 ), [^2,^3)1 an d so on. The 
value of the function jump from x p to x p+ i is then assumed to be distributed 
normally around zero, with variance given by the interval size (not its square!). 
In this way, the Wiener measure becomes transitive, in that the distribution for 
f(x p +2), when we jump to x p +2 directly from x p , is the same as that obtained 
when we first jump from x p to x p +i, and then from x p +\ to x p +2- Therefore, 
we may insert new interval endpoints in-between the old ones at will, and so 



(0, . . . , 



0). 
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approach the continuum limit. The derivative of the function will then have the 
Green's function 

(f'(x 1 )f'(x 2 )) f = S(x 1 -x 2 ) . (49) 

This singular behaviour comes from the fact that the average value of the jump 
in / is proportional to the square root of the jump size, whereas for differen- 
tiable functions it would, asymptotically, be proportional to the jump size itself. 
The Wiener measure is, therefore, completely dominated by functions that are 
everywhere continuous but nowhere differentiable. 

By integration, we can simply derive the Green's function for the function 
values themselves: 

( f{xi)f(x 2 ) ) f = min(xi,x 2 ) . (50) 

In more dimensions, we have the Wiener sheet measure, K = I s , the role of the 
derivative f'(x) is played by the multiple derivative: 

f s (x) ee (d s /dx 1 dx 2 ■ ■ ■ dx s )f(x) , (51) 

and we have 

(fs(x 1 )f s (x 2 )) f = n^i-^) . 

n 

(f(xi)f(x 2 )) f = n min «>4) • (52) 
n 

In addition, we have ( f(x) ) = 0, so that the only nonzero connected Green's 
function is 

c 2 (xi,x 2 ) = Y[min(x^,x^) = / dy h(x 1 ;y)h(x 2 ;y) , 
n K 

Hx;y) = \{e(x^-y^) , (53) 
n 

where we have also indicated the factorization. Inspection tells us that, in fact, 
h(x; y) is equal to the function x discussed for the classical discrepancy, only with 
the points x and y reflected: x^ — > 1 — .t^ 1 , y^ — » 1 — y^ . We therefore arrive 
immediately at the so-called Wozniakowski lemma: for integrands distributed 
according to the Wiener sheet measure, the average squared integration error is 
equal to the classical D 2 discrepancy, for the reflected point set. For random 
points, the expected value of the quadratic induced discrepancy is 

D Wiener = }_(}__ }_\ (54) 

as discussed above. The reflection of the point set is in practice not very impor- 
tant since any self-respecting point set is, to a high degree, more or less invariant 
under reflection. 
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Figure 3: A typical integrand under the Wiener measure. 



This result, published in jTl| |, was the first instance of the relation between 
problem classes and discrepancies. It is seen here to arise as a special case of a 
more generally applicable induction of discrepancies. Since the Wiener measure 
as discussed is Gaussian, we have also established that the error will, in fact, be 
normally distributed over the problem class, which fact was not noticed in |p"lf . 

4.5.2 Folded Wiener sheet measure 

As mentioned, integrands that are 'typical' under the Wiener measure may not 
be of the kind studied in particle physics phenomenology: as an illustration, 
we present, in fig. 6, a 'typical' integrand in one dimension. Although they are 
continuous, such integrands have no additional smoothness properties. 

As an approximation to smoother functions, one may consider so-called 
folded Wiener sheet measures, which are obtained by (repeated) integration, so 
that the fractal zig-zag structure becomes hidden in some higher derivative |12j. 
A typical integrand under the n times folded Wiener sheet measure is defined, 
recursively, by 

X\ X2 X s 

f {n \x) = J dx\ [dx' 2 ---[ dx'J^ix') , (55) 



where f^°'(x) is the integrand chosen from the Wiener measure. For these 
folded measures, the induced discrepancy can therefore be obtained trivially by 
repeated integration of h in Eq.(|53|) over the variables x: 

h^(x;y)=l[ [ - yj-0(x»-tf) , (56) 
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Figure 4: A typical integrand under the one time folded Wiener measure. 

and, for random points, the expected quadratic induced discrepancy in this case 
is given by 



This discrepancy, and hence the expected integration error, decreases extremely 
rapidly with increasing degree of folding. This can be understood easily if we 
plot, say, the one times folded equivalent of fig. 6, which is given in fig. 7. 

One sees that, even under only one folding, we obtain functions that are 
extremely smooth, and it is not surprising that such integrands can be treated 
with very small error. We conclude that, although the Wiener measure may 
be applicable in some fields (the behaviour of the stock market, say), it is not 
really a valid model of typical integrands in particle physics phenomenology. 

4.5.3 Fourier measure 

We shall now discuss another problem class measure, using orthonormal func- 
tions as described above: again, first in one dimension. Let us take 

uq{x) = 1 , U2n{x) = V2cos2nnx , U2 n -i(%) = V / 2sin27rnx . (58) 

These functions are even a complete set: if we disregard the Gibb's phenomenon, 
which we may for purposes of integration, every practically relevant function can 
be approximated arbitrarily closely by a combination of M n 's. In appendix B 
we present a short proof for completeness. Also, in many physical cases the 
Gaussian distribution of the magnitude of the several higher modes is quite 
reasonable. 



ifolded Wiener 




(57) 
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We may immediately write down the form of the induced quadratic discrep- 
ancy, as given in Eq.(^). At this point, we may make another assumption. 
The two functions U2n and U2 n -\ describe the mode with frequency n, with two 
phases apart by ir/2. These may be combined into a single mode, with a single 
phase xq: 

«2n-itt2n-l(a0 + v 2 n U2n(x) = \f2v 2n -i sin 2-nnx + V2v 2n cos2imx 

cx cos 2im{x -I- xq) . (59) 

Now, if we, reasonably, assume that xq is uniformly distributed between — 7r/2 
and 7r/2, we can infer that this necessitates 

@2n—l — ®2n , n = 1, 2, . . . , (60) 

and in the following we shall use this simplification. The discrepancy then takes 
the form 

^Fourier = _L £^ £ ^3^(3* - 

n>0 k,l 

Note that we have two equivalent expressions here, that are however of different 
consequence in computer implementation. The first line contains a double sum 
over the point set, and it can be evaluated in time 0(N 2 ), if we know a closed 
expression for ^ a\ n cos(27rna;). For instance, we may assume <72n = 1/", in 
which case we have 

^2 cos(27rnx) = y ^1 - 6x(l - x)\ . (62) 

n>l ^ ' 

On the other hand, we may use a single sum as in the second line of Eq.(|6l|); 
but then, we have to sum over the modes explicitly. Supposing that we only 
really need the modes up to n = m, the discrepancy can then be computed in 
time 0(mN). 

The multidimensional extension is again rather trivial, if we assume the 
orthonormal functions as a direct product of the one-dimensional ones, as dis- 
cussed above. We may immediately write out the induced quadratic discrepancy. 
With a little algebra, we may, in fact, arrive at 

£ 2 F ° Urier = ^X>|5>xp(M.^)| 2 > 

n k 

T(ni,n 2 ,...,n=) = a (2\n 1 1 ,2|n 2 1 , . . .,2|n« | ) > ( 63 ) 

where the sum over n now runs over all integer values (both negative and pos- 
itive) of the components, excluding the null vector (0, 0, . . . , 0). 
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Figure 5: A typical integrand under the Fourier measure, with <72n-i — °2n = 
1/n. 



We finish this section by pointing out a relation between our results for 
the induced discrepancy and the exponential sums discussed at length in ^ Q 
(for instance, see ||, chapter 2, section 2). There, the classical Kolmogorov 
discrepancy is related to exactly the same kind of exponential, the only difference 
being the absence of the factor What does this imply? Note that, in one 
dimension (for simplicity) we have 




so that the convergence of these sums informs us about the smoothness of the 
typical integrands: the smoother, on average, the integrand, the faster the a n 
has to approach zero. If a n — 1, as implicit in the treatment as given in Q, 
the average integrand is not even quadratically integrable! An example of such 
an integrand is given in fig. 8. From the point of view of the present paper, 
choosing a slowly decreasing appears to lead to a certain kind of overkill in 
the selection of good point sets. 
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Figure 6: The first nine Walsh functions, W {x) to W$(x). 



4.5.4 Walsh measure 

Now we shall discuss yet another complete set of orthonormal functions, the 
so-called Walsh functions. Again, we start with s = 1. We first introduce 
the Rademacher functions <f> n {x): these are periodic with period one, and, for 
< x < 1, 

^i(^) = | _\ if J S 1/2 ' 0n(s) = n -i(2a;) ■ (65) 

The Walsh functions W n (x) are now given as follows: let the binary decompo- 
sition of n be 

n = m + 2n 2 + 2 2 n 3 + 2 3 n A H , (66) 

then W n {x) is 

W n (x) = Mx) ni Mx) n2 Mx) n3 ■■■ , (67) 

and we define Wo (a;) = 1. As an example, we present the first few Walsh 
functions in fig. 9. The Walsh functions form a complete set, and we may use 
them to construct a problem class, defined under the Walsh measure. In fig. 10, 
we show a typical integrand under the Walsh measure, with a n — l/n. 
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Figure 7: A typical integrand under the Walsh measure, using W n with n = 
0,1,.. . ,200, and a n = l/n. 



Some insight in the computational character of the Walsh functions can be 
gained as follows. For any x in the unit interval [0, 1), let us define its binary 
notation by 

x = x^- 1 + x 2 2~ 2 + x 3 2~ 3 + ■ ■ ■ ; (68) 
it is, then, easy to see that 

<f> n (x) = (-1)*" , (69) 

and therefore, the Walsh functions just measure certain parities: with n decom- 
posed as above, we have 

W n (x) = (_l)»i*i+»2* a +n s x s + - . (70) 

Clearly, in a language with command of bitwise operations, such an evaluation 
can easily be implemented. Note, moreover, that as long as we restrict the range 
of n, our assumption of the real-number model of computation is justified: if 
the least significant bit of x has a value of 2 _p , then the finite word length of 
our computer is irrelevant as long as n < 2 P . 

The extension to the case of more dimensions is of course straightforward, 
but we shall not discuss this further. 



5 Discrepancies for random point sets 

5.1 Introduction: the need for comparison 

So far, we have studied how the choice of integration problem classes relates to 
discrepancy- like properties of the integration point set. Obviously, we should like 
to improve on classical Monte Carlo as much as we can. This raises the question 
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we shall study in the rest of this paper: suppose we have a point set (or sequence) 
at our disposal, which we want to use for integration. Suppose, moreover, that 
we have computed its extreme, or another induced, discrepancy. To what extent 
is this discrepancy better than what we could expect for random points? To 
answer this question, we have to compute, for a given value of the discrepancy, 
the probability that purely random points would give this value (or less) for the 
discrepancy. That means, we have to know the probability distribution of the 
discrepancy, which is now viewed as a random quantity through its dependence 
on the set of random points. Of course, we could restrict ourselves to the 
relatively simple question of the average discrepancy for random points, but 
it is much better to know the confidence levels, and, moreover, we shall see 
that they can actually be computed in many cases. Another point is that we 
might use a source of (pseudo)random numbers to produce a histogram of the 
discrepancy for a number of generated point sets — but in practice this is not 
feasible, since we would, before generating such a histogram, have to ascertain 
that the produced sequence is actually acceptably random: and how to do this 
without comparing to the theoretical expectations? 

5.2 Moment-generating function 

We shall study the moment-generating function of the linear and quadratic 
induced discrepancies, that is: if for a set of N truly random points, the linear 
discrepancy equals x, we then try to find the form of 

G 1 (z) = (exp(zx)) , (71) 
and then, we can (numerically) find the probability density for x: 

oo 
— oo 

and, similarly, we can find G 2 and fa for the quadratic discrepancy. Although 
the linear discrepancy is not as important as the quadratic one, we shall discuss 
it first since it indicates how the limit of large N can simply be taken. 



5.2.1 Linear Discrepancy 

Let us assume that, for a given problem class, the discrepancy function h(x; y) 
is known. The linear discrepancy is then given by 

Di = J dy ^ u k {y) , uj k (y) = h(x k ;y) - J dn(x) h(x;y) . (73) 

L k K 

To get the moment-generating function, we have to evaluate 
( D i ) = J d Vi J dy 2 --- J dy p / ^ ui kl {yi)u) k2 (y 2 ) ■ ■ ■ cj kp {y p ) \ 



fel,2,...,p 



(74) 
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Figure 8: Some points generated in two dimensions using a good pseudorandom 
generator (RANLUX, level 4 §). 
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for all p > 0. Let us consider the sum over the indices k in detail. All indices run 
from 1 to N, and some of the indices may be equal to others. Those contributions 
where the indices fall into m distinct values (that is, some of the ki all have one 
value, some other all have another value, a third group has another value, and 
so on), enter with a combinatorial factor N(N — l)(iV — 2) • • • (N — m + 1). It 
is clear that the largest combinatorial factor is obtained if as many as possible 
of the indices k are different. Now, we have 

(w k (y)} = , (75) 

and the largest nonzero contribution is therefore the one coming from the cases 
where the indices come in pairs. This also means that ( D\ ) vanishes (to leading 
order in N) if m is odd, and 

< D i m ) - j^ Nmi ^ [ J < Myi)Mv2) > ) , (76) 

where we have also made the approximation 

N(N-l)(N-2)---(N-m + l)~ N m , (77) 

which is justified when N 3> m 2 . The moment-generating function can now be 
written as 

,2m / r 2 

( e zDl ) = Y ^— ( D\ m ) = exp ( ±-W" 
x 1 ^ (2m)! x 1 1 \2N 

m>0 y ' 

W = dfi(x) I dyidy 2 h(x;y 1 )h(x;y 2 ) 



= J dfi{x) J 

K L 



dfi{x) J dyh{x;y) , (78) 



\K L 

and the probability density can be obtained by Fourier transform. In the large- 
N limit, the value of D\ has a Gaussian distribution with mean zero and width 
y/WjN. Of course, all this is nothing but the Central Limit Theorem, but we 
shall use the above combinatorial arguments also in our study of the quadratic 
discrepancy. 

The value of W depends, of course, on the problem class. The Wiener 
measure discussed above leads, in one dimension, to 



W 



= J dxdyidy 2 9(yi - x)9(y 2 - x) - | J dxdy 9(y - x) 
o \o 

= J dyxdy 2 min(yi, y 2 ) - I J dy y J 
o \o / 



1 1 

3 ~ 4 



(79) 
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and in higher dimension, we obviously have 

-G)'- ay ^ 

In the case of problem classes defined using orthonormal functions as above, we 
notice that 

/ dy h(x;y) = ^2<j n u n (x) dy u n (y) = . (81) 

L ">° L 

and therefore Di is always strictly zero for such problem classes. 
5.2.2 Quadratic Discrepancy 

We now turn to the somewhat more complicated case of the quadratic induced 
discrepancy. In order to avoid factors of N cropping up everywhere, we shall 
consider ND 2 rather than D 2 itself. We have 

D2 = jp I d y S w fc(f) w '(i/) • (8 2 ) 

Its m th moment is given by 

(N m D?) = ^Jdyi fdy 2 ---fdy m £ £ 

L L L fcl,2,...,mil,2,...,m 

( wfc! (yi)^h (yi)uk 2 (2/2 M 2 (y 2 ) ■ ■ ■ u km {y m )ui m (y m ) ) (83) 

By the same combinatorial argument as before, in the large- N limit we only 
have to consider the contribution to the sum that consists only of pairs of w's: 

a(yi,y 2 ) = ( Wfe(j/i)wfc(j/2) ) = P(yi,y2)-v(y 1 )v(y 2 ) , 
/%i,2/2) = j dxh{x;yx)h{x]y 2 ) , 

K 

v(y) = [ dx h(x; y) . (84) 



K 



At this point, a few remarks are in order. In the first place, since every parameter 
yi occurs twice, the various pairs cannot be considered independently: rather, 
we have to deal with objects of the form 

z n = J d yi dy 2 ...dyMy,y2)«(y2,ys)---«(yn-uy n )«(y n ,y^ ■ (85) 

L 

Another consideration is that of factorization in more dimensions. Whereas, 
in higher dimensions, the functions h(x;y) themselves usually factorize into a 



29 



product of one-dimensional factors (as for the Wiener and orthonormal problem 
classes), this no longer holds for the functions a: only the functions (i and v 
factorize. We therefore decide to build up ( ) from two distinct objects: 

C n = J dyidy 2 ---dy n P(y 1 ,y 2 )(3(y 2 ,y 3 )---(3(y n - 1 ,y n )p(y n ,y 1 ) , 

L 

O n = dyidy2---dy n v(y 1 )f3(yi,y 2 )---(3(yn-i,yn)v(yn) , (86) 



which we call closed strings and open strings, respectively. We can then con- 
centrate on the one-dimensional case first, and in more dimensions simply use 

C„(s-dim) = C„(l-dim) s , 

(9„(s-dim) = O n (l-dim) s . (87) 
Keeping careful account of the combinatorial factors, we have 

/ N m D m = V V m! Ml 

Pl,2,... 91,2,... 



n(V c n(-^ io «) c 

;>i ^ ' n>l 



(88) 



with the constraints 



m = p\ + 2p 2 + 3p3 H h qi + 2q 2 + 3q 3 H , 

r = 91 + 92 + 93 + --- • (89) 



The moment-generating function is, therefore, 

G 2 {z) = E^(^™)-^7# > 

«*) = , 

n>l 

*(*) = l + ]T(2z)"0„ . (90) 

n>l 

So, provided we can compute the closed and open strings for a given problem 
class, we can compute the probability density f 2 (x) for ND 2 to have the value 
x, by Fourier transform: 

oo oo 

f 2 {x) = -L J dz e- lzx G 2 {iz) = I y dz Re ( e -* za: G 2 (iz)) , (91) 
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where we have used G 2 {z*) = G 2 {z)* , since G 2 (z) is real for z real and suffi- 
ciently small. Finally, by taking various derivatives of G2(z) at z = 0, we can 
compute the moments of the ND 2 distribution, and find 

mean = C\ — 0\ , 

variance = 2{C 2 - 20 2 + Of) , 

/- C 3 - 30 3 + 30 2 0i - O? 
skewness = V8 ^ - — - q2)3/2 , 

C 4 - 40 4 + UhOi + 20l_40 2 Ol + O 4 
(C 2 - 20 2 + 0\) 



kurtosis = 12 ^ - ^ 4 1 1 • (92) 



5.3 The Wiener case 

We shall now concentrate on the Wiener problem class. As usual we shall start 
with s = 1. In this case, we have 

P(yi,V2) = min(yi,y 2 ) , v(y) = y . (93) 

The strings can be computed most easily by first defining 
1 

A n (x, V) = j d V2 dy 3 --- dy n (3(x, y 2 )/%2, 2/3) ■ ■ ■ (3(y n -i,y n )P(y n , y) , (94) 



so that 



1 

C n = j dxA n (x,x) , O n =A n+1 (l,l) , (95) 


where the second line holds only in this, the Wiener, case. The objects A n obey 
a recursion relation: 

M{x,y) = (3(x,y) , 
1 

A n (x,y) = J dz4„_i(i,z)i3(2,!/) , n = 2,3,... . (96) 


They can most easily be computed using a generating function that itself satisfies 
the obvious integral equation: 

1 

F(t;x,y) = J2Mx,y)t n , =t f dzF(t;x,z)0(z,y)+t/3(x,y) , (97) 

n>l { 

from which we shall establish a differential equation. Since 

d d 2 
min(x,0) = , — mm(x, y) = 6(x-y) , min(x, y) = -5(x-y) , (98) 
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we can conclude that 
dy 

with the boundary conditions 



2 F(t;x,y) = -tF(t;x,y) -t8(x-y) , 



(99) 



F(t;x,0) = 



^-F(t;x,y) 
dy 



y=o 



t / dz F(t;x,z) +t 



(100) 



We can solve Eq.(|99j) by standard methods, and, with t — u 2 , we have 

u 

F(t; x, y) = — u9(y — x) sm{uy — ux) H cos(u — ux) sin(uy) , (101) 

cos it 



which is actually symmetric in x and y, as it should be. From Eq.(lOl) we can 
compute C n and O n in a straightforward manner, since 



n>l 

and we arrive at 

O n -i 



± 

Y, C nt n = f dxF(t;x,x) , ^O n t n = jF{t;l,l) 



(102) 



TV 

2C„ , 



f(2n) 



(2 2 " - 1) 



(2n)! 



So 



(103) 



where C(2n) is the Riemann zeta function, and Bm is the Bernoulli number. 
The first few instances are 



c 1 r - 17 

C3 ~15 ' C4 "630 
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(104) 



Incidentally, we also get Oo = 1, in accordance with the fact that the series 
expansion of x( z ) must start with 1. The functions ip and x obtain the form 



n>l 



8z 



*(*) - E 



7r 2 (2n- l) 2 
1 



n>l 



7r 2 (2n- l) 2 - 8z 



'22 



1 



= — — log cos v 2z , 



tan v 2z 



(105) 



We can, using the series expansion forms, compute ip(iz) and x(iz) to arbi- 
trary accuracy. We might also use the fact that, in one dimension, 



G 2 {z) 



V2z 
sin \rlz 



1/2 



(106) 
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Figure 9: Real and imaginary part of G2(iz) for s = 1. 



but, what with all the branch cuts this function has in the complex z plane, it 
is not easy to extend the computation of this more compact form so that G<}[z) 
is adequately defined along the whole imaginary axis; and we shall use the form 
(105). Note that we may 'truncate' the series at large enough n, and sum the 
remainder in the approximation z <C n 2 : this gives, for instance 



i)(z) 



1 1 2 

-z + -z 2 - 

2 6 2 



n=l 

1 K 

]_ \ ' z _n_ 

n 

(2z 



n=l 

2 i K ^ 
l + -z- 

3 Z * — ' 1 — Z n 

n—1 



log(l - Z n ) + Z n + \^Z 2 n 



(107) 



7T 2 J (2n- l) 2 ' 

where if is chosen large enough. In fig. 11 we present the generating function 
G-i{iz) for positive z. 

The Fourier transform now gives us f2{ND2) for s = 1. Its shape is given 
in fig. 12, for D 2 > 0. We have checked that /^(s) = for negative x. The 
shape of the density is seen to be skewed, and not to approximate a Gaussian 
in the large- N limit (in fact, there is no Central Limit Theorem telling us that 
it should): the maximum is somewhat lower than the mean. It can be checked 
numerically that f2(x) is normalized to 1, and the first two moments are 1/6 
and 1/20, respectively, in agreement with Eq.(|9^). 

We now turn to the case of s dim ensio ns. We have to replace C n — * C®, 
O n — * O n , and the nice analytic form ( |106| ) of the moment-generating function 
is lost. The series form for ip and x can be written in a form similar to that of 
Eq.(105), if we introduce P s (n), the factor multiplicity in s dimensions. This 
number counts in how many ways an odd integer n can be written as a product 
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Figure 10: Probability density /^(./VZ^), for s = 1. 



of s odd integers, including factors 1. In appendix D we show how this can 
easily be computed. The functions tp and \ then take the form 



21 V6/ ' 2 

K 



log(l - z n ) + z n + -z 2 n 



=» s (^)'pSf • (108) 

We can now compute the f2(x) in these more-dimensional cases. In doing so, we 
must realize that G2(z) becomes increasingly flat, and hence f2(x) increasingly 
more peaked, with increasing s, as is evident from table 1, where we give the 
mean, standard deviation a, and skewness of f2(x) as a function of s. Note that 
the deviation from a Gaussian, as indicated by the skewness, decreases with 
increasing s, but does so very slowly: the skewness decreases by a factor of 10 
for an increase in s by about 113, and attains the value 0.001 only for s = 389. 
For 'medium' dimensionality, such as is common in high-energy phenomenology, 
we cannot trust the Gaussian approximation, and the full numerical Fourier 
transform has to be used. The results are best presented in the form of the 
standardized variable 

Z= X —^1 . (109) 
a 

In fig. 13 we plot cr/ 2 (a;) as a function of £ for various dimensions. For compari- 
son, we also give the shape of the standard normal distribution under the same 
plotting convention. 

For comparison purposes, the confidence levels implied by /2(x) for a given 
dimensionality are relevant, as we have discussed. We therefore give, in table 
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Figure 11: Probability density <rfi{{ x ) + <r£) for the Wiener problem set, as a 
function of the standardized variable £, for dimensions from 1 to 128. The last 
plot is the normal distribution with the same convention. 
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3 


0.880E-01 


0.502E-01 


0.236E+01 


4 


0.502E-01 


0.242E-01 


0.234E+01 


6 


0.143E-01 


0.491E-02 


0.231E+01 


8 


0.375E-02 


0.915E-03 


0.226E+01 


10 


0.960E-03 


0.163E-03 


0.220E+01 


12 


0.242E-03 


0.283E-04 


0.214E+01 


16 


0.152E-04 


0.819E-06 


0.200E+01 


20 


0.953E-06 


0.231E-07 


0.186E+01 


24 


0.596E-07 


0.647E-09 


0.172E+01 


32 


0.233E-09 


0.501E-12 


0.147E+01 


40 


0.909E-12 


0.387E-15 


0.125E+01 



Table 1 : Parameters of f 2 (x) as a function of dimension. 



2, a number of quantiles for a number of different dimensions. Note that the 
50% quantilc corresponds to the mode of the distribution, which (due to the 
skewness) is not necessarily equal to the mean value, which for the standardized 
variable £ is of course zero. To have an impression of the numerical accuracy 
of the quantile determination, we also give the difference between 1 and the 
actual numerical value of the total integral, which ideally should be zero. The 
behaviour of the quantiles with dimension shows that the case s = 1 is not really 
typical, since the distribution for s = 2 appears to be much wider. This can 
be understood somewhat if we realize that f 2 {x) vanishes for negative x: if a is 
comparable to ( x ) , this forces the low quantiles to be relatively close to zero. 

5.4 Orthonormal case 

We now turn to the case of a problem set defined in terms of orthonormal 
functions, such as the Fourier or Walsh set. In these cases, there is no problem 
with factorization in higher dimensions. For s = 1, we have 

a(#i> 2/2) = ^2 o , lu n (y 1 )u n (y 2 ) ; (110) 

n>l 

from which it is trivial to prove that 

Z m = J2°n m ■ (HI) 

n>l 

Now, we may use C n — Z n , and O n = 0, to construct the moment-generating 
function: 

G 2 (z) = exp(i>(z)) , 
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dimensionality 










1 


2 


4 


8 


16 


32 


64 


oo 


0.1 % 


-1.00 


-1.15 


-1.27 


-1.39 


-1.66 


-2.06 


-2.53 


-3.09 


1 % 


-0.95 


-1.06 


-1.14 


-1.23 


-1.40 


-1.67 


-1.97 


-2.33 


5 % 


-0.87 


-0.94 


-0.98 


-1.03 


-1.12 


-1.28 


-1.46 


-1.64 


10 % 


-0.81 


-0.86 


-0.88 


-0.90 


-0.96 


-1.06 


-1.17 


-1.28 


50 % 


-0.32 


-0.29 


-0.28 


-0.26 


-0.22 


-0.16 


-0.09 


0.00 


90 % 


1.21 


1.22 


1.21 


1.21 


1.21 


1.22 


1.25 


1.28 


95 % 


1.98 


1.96 


1.95 


1.94 


1.90 


1.83 


1.74 


1.64 


99 % 


3.87 


3.80 


3.78 


3.74 


3.63 


3.36 


2.93 


2.33 


99.9 % 


6.72 


6.56 


6.54 


6.43 


6.25 


5.70 


4.75 


3.09 




7.6E-5 


6.3E-5 


6.0E-5 


5.1E-5 


4.1E-5 


1.7E-5 


2E-6 






Tabic 2: Quantilcs of the standardized distribution of the quadratic discrepancy 
for the Wiener problem set, for various dimensions. The infinite dimensionality 
corresponds to the normal distribution. The last line gives the accuracy of the 
total integral. 



*!>(*) = log (1-2*^) 

n>l 



n>l rn>l m > 1 

(112) 

where we have also indicated the proper way to evaluate it. In the more- 
dimensional case, we might assume that 

On = <J( ni ,n 2 ,...,n e ) = O ni (J n2 1 ' ' <?n e , (H3) 

and obtain an analogous formula. An attractive choice would be to have 

o n = — - — , (114) 
2n-l ' v ; 

so that we may use the factor multiplicity P s {n) again: 

G 2 (z) = exp(i>(z)) , 
${z) = -^P s (n) log {l-2za 2 n ) 

n>l 



\ J2 Ps(n) [log (1 - 2zal) + 2za 2 n + 2z 2 ^] + z^(2) s + z 2 £(4) s , 

n>l 

(115) 
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6 Discrepancy Calculations 



6.1 Extreme Discrepancy 

Numerical calculation of the extreme discrepancy of a particular point set re- 
quires first calculating the local discrepancy at a position in the hypercubc, then 
finding the maximum of that local discrepancy over the hypercube. The local 
discrepancy is calculated in time Ns, where N is the number of points and s the 
dimensionality, since it involves essentially counting the number of points all of 
whose coordinates are less than the position being considered. For high dimen- 
sionality some time can be saved because as soon as one coordinate of a point is 
greater than the coordinate of the position, it is not necessary to test the others, 
so the actual time would behave more like Ny/s. The major problem is of course 
finding the maximum of this function. The local discrepancy is discontinuous 
not only at every point, but also at every position of which any coordinate is the 
coordinate of a point. This means there are N B candidate positions where the 
maximum could potentially occur. Again a clever program can save some time 
because some of these positions are not in fact possible maxima, but still the 
overall computational complexity can be expected to behave approximately as 
Ny/sN s , which makes it prohibitive to calculate for large point sets in high di- 
mensions. Practical limits are around ten points in ten dimensions or 100 points 
in five dimensions, which are too small to be of interest for real calculations. 

6.2 Exact Quadratic Discrepancy 

Although it may appear that quadratic discrepancy is more complicated than 
extreme discrepancy because it is defined as an integral over the hypercubc, in 
fact the integrals involved are only step functions and it is possible to perform 
considerable simplification to reduce the calculation time. Perhaps the best way 
to explain how the calculation works is simply to give the Fortran code: 

subroutine discd2(ntot,ns,x,d2) 

* compute the quadratic discrepancy for the point set 

* x(i,mu), i=l , 2 , . . . ,ntot mu=l,2, . . . ,ns 

* the output is the array d2(n), n=l,2, . . . ,ntot 

* and gives the discrepancy * n~2 for the first n points 

implicit real*8 (a-h, o-z) 
dimension x(ntot,ns) ,d2(ntot) 

* initialize a few constants 

c2 = (Id0/2d0)**ns 
c3 = (Id0/3d0)**ns 
bn = 

* start loop over number of points 

do 999 n = l,ntot 

if (mod (n, 100) .eq.O) print *, 'doing' ,n, '... ' 
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* compute b(n) and a(n,n) for the new point 

a = l.dO 
b = l.dO 
do 1 mu = 1 , ns 

a = a*(l.dO-x(n,mu)) 

b = b*(l.d0-x(n,mu)**2) 

1 continue 
b = c2*b 

* update running value of sum_b 

bn = bn+b 

* case n=l 

if(n.eq.l) then 

d2(n) = a - 2*bn + c3 

* case n>l 

else 

* sum the a(i,n) for i=l to n-1 

an = O.dO 

do 3 i = l,n-l 

temp = 1 . dO 

do 2 mu = 1 , ns 

temp = temp*(l.dO-dmaxl(x(i,mu) ,x(n,mu))) 

2 continue 

an = an+temp 

3 continue 

* give d2(n) for n>l by relating it to d2(n-l) 

d2(n) = d2(n-l) + 2*an + a - 2*bn -2*(n-l)*b + (2*n-l)*c3 
endif 

* end of loop over n 
999 continue 

end 

Note that we used double precision to calculate the discrepancy, even though 
we generated the points in single precision. This is necessary since considerable 
precision is lost in the big sums, and we found that in single precision only about 
one or two digits of accuracy remained for the larger point sets. 

6.3 Quadratic discrepancy by Monte Carlo 

Since the quadratic discrepancy is a multidimensional integral, it can be es- 
timated using a Monte Carlo approximation. If the desired accuracy is, for 
example 5% , we found Monte Carlo to be faster than the exact calculation 
for more than about 50,000 points. We used ordinary pseudorandom numbers 
for these calculations for fear that correlations between the two quasi-random 
point sets would produce unreliable results. We repeated some Monte Carlo 
calculations using the exact algorithm to verify that the agreement was good. 
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van der Corput 3d 




25000 50000 75000 100000 25000 50000 75000 100000 

(a) van der Corput, s = 3, 4 and 5 (b) RanLux, s = 4 

Figure 12: The quadratic discrepancy for up to 100 000 points for (a) the van 
der Corput generator in 3, 4 and 5 dimensions; and (b) four samples of random 
points in 4 dimensions. 

6.4 Numerical Results 

We have calculated the quadratic discrepancy for four different quasi-random 
generators: Richtmyer, van der Corput-Halton, Sobol' and Niederreiter-base- 
two, all from one to twenty dimensions. Exact calculations were made for 1- 
100,000 points, and Monte Carlo calculations for 100,000 and 150,000 point 
sets. 

The programs used to generate the Richtmyer points and van der Corput 
points were written by us, and work in any number of dimensions limited only 
by some dimension statements and the table of prime numbers. The Sobol' and 
Niederreiter programs were kindly provided by Paul Bratley and Bcnnet Fox. 
Their Sobol' program is implemented for up to 40 dimensions, and we have 
extended their Niederreiter-base-two programs to work up to 25 dimensions. 

We have consumed weeks of computer time producing dozens of plots of 
quadratic discrepancy as a function of N for the four quasi-random generators 
and also some good pseudorandom generators, in all dimensions from one to 
20. It would be fastidious to show all these plots here, especially since many of 
them are rather similar and most of the behaviour can be summarized in some 
more revealing plots as we later discovered. However, we do show here a few 
typical plots as well as those which we found the most surprising. 

Figure [jj] shows some examples of quadratic discrepancy calculated for ran- 
dom and quasi-random point-sets up to 100 000 points. In these plots, the error 
bars give the total range of quadratic discrepancies calculated for all the 1000 
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Figure 13: The quadratic discrepancy for up to 100 000 points for both random 
and quasi-random points in 8 dimensions. Three different random point sam- 
ples are compared with our four quasi-random generators which all have much 
smaller discrepancy. 

values of N in the range around the plotted value. Thus the tops and bottoms 
of the error bars give the envelope of the full plot, had we plotted a point for 
each value of TV. Subfigure (a) shows a typical quasi-random behaviour for low 
dimensionality, in this case the van der Corput generator for 3, 4 and 5 dimen- 
sions. One sees that apart from very small point-sets, there is a considerable 
improvement over the expectation for random points (defined here as =1.0), 
and this improvement increases (faster convergence) with increasing N, as ex- 
pected from the theoretical results. However, with increasing dimension, the 
improvement over random points decreases. For comparison, figure |l2|(b) shows 
the same quadratic discrepancy for four typical random point sets. Note the 
different scale, however, as these discrepancies are much larger than those of 
quasi-random points. 

Figure [l3] shows the discrepancy in eight dimensions for both random and 
quasi-random points. Our normalized values of quadratic discrepancy for three 
random sequences are of course close to one as they must be. All the four 
quasi-random sequences show a much lower discrepancy, between a factor five 
and ten lower for 50 to 100 000 points. For more than 40 000 points, the Richt- 
myer generator is not quite as good as the other three quasi-random generators. 
Note that these results are obtained for dimension 8. The results for other low 
dimensionalities are qualitatively similar. 
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Figure 14: The quadratic discrepancy for up to 100 000 points for the Sobol' 
sequence in dimensions 9 to 17. 



It must be realized, however, that for higher dimensionalities surprises may 
be in store. As an illustration, we give in figure [l4| the discrepancy for the 
Sobol' sequence in dimensions 9 to 17, as a function of the number of points. It 
is seen that around 70,000 points the discrepancy it not small at all, and is ac- 
tually worse than random for s = 11; after this bump, however, the discrepancy 
quickly returns to a very small value. Although such a behaviour can presum- 
ably be well understood from the underlying algorithm, it must be taken as a 
warning that for quasi-random sequences, an increase in the number of points 
does not automatically lead to an improvement. Note that this behaviour is 
not necessarily the same as the well-known 'irregularity of distribution', since 
in figure [li] it is actually the envelope of the discrepancy curve that shows a 
global, long-term rise (for s — 11, running from N ~ 30,000 to N ~ 70,000, a 
much larger scale than ought to be expected for local increases in discrepancy) . 

Figure [l5| shows a summary of the quadratic discrepancies for all our quasi- 
random generators. As usual, all discrepancy values are normalized to the 
expectation for a truly random point set. Since we found the behaviour with 
respect to N rather uninteresting for most cases, we have instead plotted here 
the discrepancy for a few fixed values of N as a function of the dimensionality 
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Figure 15: The quadratic discrepancy, normalized to the quadratic discrepancy 
for truly random numbers for N=50 000, 100 000 and 150 000 points as a 
function of dimensionality for four different quasi-random generators. 
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s. This is considerably more revealing, and in fact the first striking aspect is 
the resemblance between the behaviours of all the different generators. 

Now we see that the dominant effect is that all generators are good for small 
s and all discrepancies rise sharply as a function of s, finally approaching the 
random expectation around s = 15 for N — 150 000. 

The van der Corput generator becomes much worse than random above 
s = 15, whereas the other discrepancies remain rather close to random up to 
s = 20. Surprisingly, the only one that stays better than random up to s = 20 
is the Richtmyer. 

In Figure |l6| we plot the discrepancies in terms of the variable £ of Eq. ( |l09| ) . 
This representation ought to inform us about how 'special' the quasi-random 
sequences are compared to truly random ones, since on this scale 95% of all 
truly random sequences will fall between -2 and +2. Now we see that the 
lower discrepancy of the Richtmyer sequence compared to random sequences 
is really significant up to 20 dimensions, provided one uses more than 100 000 
points. The poor behaviour of the Van der Corput sequences above s = 15 is 
highly significant, but the other generators look much like random ones between 
s — 18 and s = 20. This provides additional motivation to study the Richtmyer 
generator, which is easily implementable, and in particular to look for optimal 
constants S^. 

It should be noted, that the discrepancy improves in general with increasing 
numbers of points. It might be conjectured, that asymptotic behaviour of se- 
quences' discrepancy will, for s > 10, only become evident for a larger number 
of points. 

7 Conclusions 

We have reviewed a number of arguments that suggest that the use of quasi- 
random numbers in multi-dimensional integration may lead to an improved 
error over classical Monte Carlo. The central notion in this context is that of 
discrepancy. We have shown, that there exist a relation between the definition 
of a discrepancy and the class of integrands of which our particular integrand is 
supposed to be a typical member. We have discussed several such classes and 
derived the appropriate induced discrepancies. 

Another important aspect of discrepancy is the fact, that for a quasi-random 
point set it ought to be smaller than for a truly random point set. We have 
therefore studied the distribution of the value of discrepancy for truly random 
points. 

Finally, we have computed the quadratic Wiener discrepancy for a number 
of quasi-random point sets. For all the quasi-random generators tested, the dis- 
crepancy, normalized to that of truly random numbers increases with increasing 
dimensionality and approaches that of truly random point sets around s = 15. 
If the quadratic discrepancy is plotted in terms of the standardized variable 
£, it is seen that the quadratic discrepancy for the quasi-random point sets is 
indeed significantly better than that of a typical member of the ensemble of 
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Figure 16: The standardized quadratic discrepancy £ for N=50 000, 100 000 a 
nd 150 000 points as a function of dimensionality for four different quasi-random 
generators. For N = 150 000, the Monte Carlo error-estimate is given. 



45 



random point sets for dimensions up to s w 12 for TV = 100 000 to s ~ 15 for 
N = 150 000. Only one quasi-random generator (the Richtmyer) remains signif- 
icantly better than random up to the highest dimension we calculated (twenty) . 
For higher dimensions, it is conjectured that the asymptotic regime for the dis- 
crepancy is approached only for a much higher number of points N, which we 
did not cover in our computations. 
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Appendix A: Van der Corput discrepancy 




Here we present some considerations on the behaviour of the van der Corput 
sequence in base 2, that can be obtained in a straightforward manner. We 
consider the extreme discrepancy for the finite iV-member point set Xf. = (f>2(k), 
with k = 0, 1, 2, . . . , iV — 1, and we define 

d(N)=ND 00 (N) . (116) 

We trivially have that d(N) > 1; more precisely, by inspection of the behaviour 
(given in fig. 1 ) we observe the following recursion: 



if N is even, 

(117) 

§ (l + d(^±) + d(^±I)) if N is odd. 

This recursion sheds some light on how the discrepancy depends on the binary 
properties of N, and in this manner we can compute d(N) very quickly even for 
extremely large N. Now, we find the maximal discrepancy in the following way: 
in each interval between N = 2 P ~ 1 and N = 2 P , we search for the maximum of 
d(N). This is reached for two values of N, and we concentrate on the smallest 
one of these, which we denote by v p : the corresponding value d(N) we denote 
by dp. Again by inspection, we find the following recursions: 

u p = 2^ p _i + (-If , dp = - {dp-! + dp-2 + 1) , (H8) 

with starting values v\ = 1, d\ = 1, and di = 3/2. These relations are easily 
solved to give closed formulae for v v and d p : 

u p = l(2 N+1 + (-ir) , d p -i(3iV + 7-(-2) 1 - A ') . (119) 

Now, as ./V becomes large, we can easily eliminate p, and we find the asymptotic 
behaviour 

d (120) 
dp log8 ' ( Uj 

whi ch ag rees nicely with Eq.(^6|). For other bases, recursion relations similar to 
Eq.(117) ought to exist, but we have not pursued this. 



Appendix B: completeness of the Fourier set 

We present a short and simple proof of the completeness of the orthonormal 
functions u n (x) introduced in Eq.([58|). To this end, consider the sum 

2JV 
n=0 
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N 



1 + 2 [cos 2imx cos 2wny + sin 2nnx sin 2Trny] 



71=1 



= 1 + 2 cos 2-kti{x — y) 

71=1 

= A N (x-y) . (121) 
We consider the limit 

A(0 = lim Ajy(0 . (122) 

TV— >oo 

Obviously, this diverges for £ = 0, and it remains to prove that we get zero if £ 
is not zero. But we have, for such £, 



A(£) = 1 + Reexp(2i7r£r 

71>0 

= l + 2Re^(e 2i7r «)" 



7t>0 

= 1 + 2Re T_^ > ( 123 ) 

which is indeed zero. Therefore, we have 

A(0 = «5(0 , (124) 

the Dirac delta distribution. Note that we have to assume that the sum over 
n in Eq.( p_21[ ) has to have an upper limit 27V rather than N: the way in which 
the limit is approached is important. Using the property that for a continuous 
function f(x), 



fix) = J dySix-y) f(y) 

^u n {x) I dyu n {y)f{y) , (125) 



n>0 



we immediately get the decomposition of fix) into the orthonormal base. 

Appendix C: completeness of the Walsh set 

We also prove that the set of all Walsh functions is complete. Let us first note 
that 

W n (x)W n (y) = W„(0 , (126) 

where £'s binary expansion 0.£i^2^3 ■ • • is the bitwise XOR of the binary expan- 
sions of x and y, that is, £ has zeroes in all positions except those where the 
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binary digits of x and y are different. We now define 



A N (x,y) = Yl W n (x)W n (y) 
n=0 
2 n+i_ 1 

= E w -(0 

n=0 

1 1 1 

= E E (-1) £2 " 2 ••• E ■ (127) 

ni— rt2- njv— 

Now, if n runs from to 2 N+1 — 1, all its binary digits n\ t 2,...Js[ will be and 1 
an equal number of times. Therefore, each factor in the sum ( |127[ ) will evaluate 
to zero unless its corresponding digit of £ is also zero. Hence, A^(x,y) is zero 
unless the binary digits of x and y agree up to and including the place, and 
in that case it will be 2 . Taking the limit N — > oo we again recover the Dirac 
delta distribution. As in the Fourier case, the way we take the limit turns out 
to be relevant. 

Appendix D: The function P s (n) 

In this appendix we discuss the function P s (n), which counts in how many ways 
an odd integer n can be written as a product of s integers (including ones): 

(128) 

«l,2,...,s>l 

Obviously, Pi{n) — 1, and we have the recursion 

Ps + i(n)=J2 p s(m)S nmodm<0 . (129) 

rn> 1 

Note that the fact that n is odd automatically restricts these sums to the odd 
integers only. The recursion ( |129| ) can trivially be implemented to give us P s (n) 
for very large values of s and n. In the table we give its first few values. Note the 
irregular behaviour of P s (n) with n: for n prime it equals s, while new maxima 
are encountered whenever n is a product of the first consecutive odd numbers, 
or a triple of that number. Then, it can become quite large: for instance, 
Pio(945) = 22000. We need, however, not worry about the convergence of series 
like those giving ip(z) and x(z); for, we have 



2n- l) x 



y 1 

^ (On. - 



(2n- l) x 



axy , (130) 



and sums like these are finite whenever x > 1. The function P s (n) without the 
restriction to odd integers is discussed, for instance, by Hardy and Wright H . 
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29 
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8 
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10 


43 
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2 
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5 


6 


7 


8 


9 


10 


45 


1 


6 


18 


40 


75 


126 


196 


288 


405 


550 



Table 3: Values of P s (n) for low values of s and n. 
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